Skip to content

Fix information leakage of confidential features in Search API#6276

Merged
DanielRyanSmith merged 7 commits intomainfrom
fix-search-info-leak
Apr 29, 2026
Merged

Fix information leakage of confidential features in Search API#6276
DanielRyanSmith merged 7 commits intomainfrom
fix-search-info-leak

Conversation

@DanielRyanSmith
Copy link
Copy Markdown
Collaborator

@DanielRyanSmith DanielRyanSmith commented Apr 16, 2026

Fixes an information leakage vulnerability in the Search API where total_count was calculated before filtering out confidential features.

Root Cause / Motivation

During a security review, it was discovered that the total_count property of the Search API response was calculating the total number of features matching a query across the entire datastore before applying confidentiality filtering on the current page. This could be abused to leak the existence or total number of confidential features matching a specific query by an unprivileged user.

To fix this, internals/search.py now leverages a parallel datastore query to identify all confidential features matching the query, and filters out the ones the user doesn't have explicit permission to view from the result_id_set before the total_count is evaluated and the results are paginated.

Optimizations

  1. Datastore Query Bypass: The parallel datastore query for confidential features and the subsequent filtering logic are now completely bypassed if the requesting user has global permissions to view all features (e.g., internal Google/Chromium accounts or site administrators). This prevents unnecessary database reads and processing.
  2. Cache Partitioning: The Redis search cache is no longer fragmented by individual user emails. To preserve high cache hit rates and reduce memory bloat, the cache is partitioned by access level (global_access, no_access, or the user's email if they participate in any features). This ensures the majority of users safely share the same cached results.

Detailed Changelog

  • internals/search.py:
    • Added confidential_future_ops to parallelize finding confidential features that matched the user's query. Those results are dynamically cross-checked with the user's view permissions, and explicitly stripped out of result_id_set prior to pagination and total count length check.
    • Optimized the search flow to skip confidential_future_ops and intersection filtering if is_google_or_chromium_account(user) or can_admin_site(user) is true.
    • Updated make_cache_key to partition cache entries by access level (global_access, no_access, or user email) rather than partitioning strictly by user email, restoring search cache performance.
  • internals/search_test.py:
    • Updated test_make_cache_key assertions to use the expected no_access partition string instead of anonymous.

@DanielRyanSmith DanielRyanSmith force-pushed the fix-search-info-leak branch 2 times, most recently from ed435f7 to b8dc38f Compare April 16, 2026 19:22
@DanielRyanSmith
Copy link
Copy Markdown
Collaborator Author

CodeQL seems to be upset with existing code

@DanielRyanSmith
Copy link
Copy Markdown
Collaborator Author

I'll fix the CodeQL issues in this PR as well if possible

Comment thread internals/search.py Fixed
Comment thread internals/search.py Fixed
Comment thread internals/search.py Fixed
This commit addresses two major issues introduced in the previous fix for
the search API information leakage:

1. Information Leakage via Cache: Caching is enabled for `name_only=True`
queries, but the cache key didn't partition based on the current user. Thus,
if an authorized user cached a confidential result, an unauthorized user
could retrieve it from the cache. The cache key has been updated to include
the user's email to securely partition the results.

2. Unscalable Model Fetching: The previous implementation used `ndb.get_multi`
to fetch all matching confidential `FeatureEntry` models into memory to
perform permission filtering. This has been optimized to evaluate permissions
via efficient integer ID set subtraction using existing participant cache
keys and admin checks, completely avoiding the expensive `ndb.get_multi` call.
Bypasses the Datastore query for confidential features if the user has global permission to view all confidential features. This saves resources by avoiding unnecessary database reads and processing for internal Google/Chromium users and site administrators.
Instead of partitioning the search cache by individual user emails, which causes severe cache fragmentation and degrades performance, the cache is now partitioned by access level (`global_access`, `no_access`, or the user's email if they participate in any features). This ensures that the majority of regular users and unauthenticated users share the same cache, improving hit rates and preserving Search API performance.
@DanielRyanSmith DanielRyanSmith added this pull request to the merge queue Apr 29, 2026
Merged via the queue into main with commit 5668d46 Apr 29, 2026
8 checks passed
@DanielRyanSmith DanielRyanSmith deleted the fix-search-info-leak branch April 29, 2026 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants